FREQUENCY DISTRIBUTIONS OBTAINED BY CERTAIN TRANSFORMATIONS
OF NORMALLY DISTRIBUTED VARIATES.*

By H. L. Rietz. 

The problem considered in this paper was first suggested to the writer
by experiments with actual frequency distributions of various measurements
of objects which approximate roughly to a set of similar solids.
To be concrete, we may think of the diameters, surfaces, and volumes of 
spheres that represent objects in nature, such as oranges on a tree or 
peas on a plant. 

Suppose the distribution of diameters is a normal distribution given by 

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\bar{x})^2}{2\sigma^2}}.$$

It seems natural to inquire into the nature of the distribution of the
corresponding surfaces and volumes. Conversely, we should ask for a
determination of the distribution of diameters if we knew surfaces or
volumes were normally distributed. The same kind of problem† would
arise if we knew that velocities, $v$, of molecules of a gas were normally
distributed, and were required to investigate the distribution of energy
$\tfrac{1}{2}mv^2$. These concrete illustrations are special cases of the
transformation of variates of a normal distribution by replacing each variate,
$x$, by an assigned function $kx^n$, where $k$ is a positive constant and $n$ is
a positive integer or the reciprocal of a positive integer. Edgeworth‡ and
Kapteyn§ have made use of transformations of the normal curve as a method of
representing skew frequency distributions. Apart from the possible use
for this purpose, the frequency curves arising from certain simple
transformations of the variates of a normal distribution present points of
special interest to which it seems that attention should be directed,
particularly because of the striking differences in general appearance from
normal curves, a fact that seems both interesting and important in
forming a proper conception of the place of the normal curve in the
representation of frequency distributions.

* Read before the American Mathematical Society at Lincoln, Nebraska, Nov. 27, 1920.
† Edgeworth, Proc. Fifth International Congress of Mathematicians, II, p. 427.
‡ Loc. cit. and a series of papers in the Journal of the Royal Statistical Society. See vol. 61,
pp. 670-700.
§ Skew Frequency Curves in Biology and Statistics, 1903.

It is the main purpose of the present paper to exhibit certain properties
of the frequency curves that are obtained when the variates of a normal
distribution are transformed by substituting for each variate, $x$, the
function $kx^n$, where $k$ is a positive constant, and where suitable
restrictions will be placed on $n$ as we proceed. The case $n = 1$ is treated by
Bruns* and the results are simple and well known. Edgeworth called
attention to the general form of the frequency curve with which we are
concerned for $n = 2$. Furthermore, when deviations of variates from
their mean value are small compared to their mean value, it is well known
that the distributions of squares and cubes of variates approach normal
distributions sufficiently near for certain purposes. But in certain
important statistical applications, the deviations of variates from their
mean value cannot be reasonably regarded as small compared to the mean
value. This latter class of distributions gives special importance to
our problem.

When $k = 1$, the problem is that of exhibiting the properties of the
frequency distribution of the $n$th powers of a set of normally distributed
variates. This case seems to include essentially the same points of
interest contained in the more general problem, since the transformation
$x' = x^n$, followed by the linear transformation $x'' = kx'$, produces the
same result as the transformation $x'' = kx^n$. Hence we shall in what
follows deal with the transformation $x' = x^n$.

To determine the frequency function obtained by the transformation,
let $x_1, x_2, \ldots, x_t$ be a system of variates expressed in a unit equal to the
standard deviation $\sigma$, and belonging to the normal distribution

$$y = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(x-\bar{x})^2}{2}}, \qquad \bar{x} \geq 0,$$

so that $P = \int_a^b y\,dx$ is the probability that a variate taken at random
belongs to the interval $a$ to $b$.

Let us replace each variate $x_s$ by $x_s'$, where $x_s' = x_s^n$. We then make
a corresponding transformation of the integral $\int y\,dx$ by letting $x' = x^n$
($n \neq 0$). Then

$$dx' = n x^{n-1}\,dx, \quad \text{except at } x = 0 \text{ when } n < 1,$$

and

$$dx = \frac{dx'}{n\,x'^{\,1-\frac{1}{n}}}, \quad \text{except at } x' = 0 \text{ when } n > 1.$$

We may therefore write

$$P = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-\frac{(x-\bar{x})^2}{2}}\,dx = \frac{1}{n\sqrt{2\pi}} \int_{a^n}^{b^n} \frac{1}{x'^{\,1-\frac{1}{n}}}\, e^{-\frac{(x'^{1/n}-\bar{x})^2}{2}}\,dx', \tag{1}$$

where for the present we assume $a > 0$ and $b > 0$. As shown below,
these limitations on $a$ and $b$ may be removed to some extent for certain
values of $n$.

* Wahrscheinlichkeitsrechnung und Kollektivmasslehre, 1906, pp. 126-129.

The frequency curve of $x'$-variates obtained from positive $x$-variates
is then given by

$$y' = \frac{1}{n\sqrt{2\pi}\,x'^{\,1-\frac{1}{n}}}\, e^{-\frac{(x'^{1/n}-\bar{x})^2}{2}}. \tag{2}$$

The function (2) does not represent a normal curve when $n \neq 1$.
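As a numerical sanity check on (1) and (2), the following minimal Python sketch compares the normal probability of an interval $a$ to $b$ with the integral of the transformed density over $a^n$ to $b^n$. The function names, the midpoint rule, and the values $n = 3$, $\bar{x} = 4$, $a = 3$, $b = 5$ are illustrative assumptions and are not taken from the paper.

```python
import math

def power_density(x_prime, n, mean):
    """Density (2) of x' = x**n for x' > 0 and n > 0, unit standard deviation."""
    root = x_prime ** (1.0 / n)        # recovers the original variate x
    jacobian = root / (n * x_prime)    # dx/dx' = x'**(1/n - 1) / n
    return jacobian * math.exp(-0.5 * (root - mean) ** 2) / math.sqrt(2.0 * math.pi)

def midpoint_integral(f, lo, hi, steps=20000):
    """Plain midpoint rule; an assumed, deliberately simple quadrature."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

def normal_probability(a, b, mean):
    """Probability that a unit-variance normal variate lies between a and b."""
    s = math.sqrt(2.0)
    return 0.5 * (math.erf((b - mean) / s) - math.erf((a - mean) / s))

n, mean, a, b = 3, 4.0, 3.0, 5.0
left = normal_probability(a, b, mean)                                    # left side of (1)
right = midpoint_integral(lambda t: power_density(t, n, mean), a ** n, b ** n)  # right side of (1)
print(left, right)
```

The two printed numbers should agree to within the quadrature error, which is the content of equation (1) for this choice of interval.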

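In order to determine the general character of the frequency curve
given by (2), we first examine the function for maxima and minima. For
this purpose, we have

$$\frac{dy'}{dx'} = \frac{1}{n\sqrt{2\pi}}\, e^{-\frac{(x'^{1/n}-\bar{x})^2}{2}} \left[ -\frac{n-1}{n}\, x'^{\frac{1}{n}-2} - \frac{1}{n}\left(x'^{\frac{1}{n}} - \bar{x}\right) x'^{\frac{2}{n}-2} \right].$$

The derivative changes signs at

$$x' = \frac{1}{2^n}\left(\bar{x} \pm \sqrt{\bar{x}^2 - 4(n-1)}\right)^n, \tag{3}$$

when $\bar{x}^2 > 4(n-1)$, and at $x' = 0$ for certain values of $n$.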
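The critical values in (3) can be checked numerically. The sketch below, which is only an illustration under assumed names (`density`, `critical_points`) and the assumed values $n = 3$, $\bar{x} = 4$, evaluates the two abscissas from (3) and compares the density (2) at each with its value one per cent to either side.

```python
import math

def density(x_prime, n, mean):
    """Density (2) for x' > 0 and n > 0, unit standard deviation."""
    root = x_prime ** (1.0 / n)
    return root / (n * x_prime) * math.exp(-0.5 * (root - mean) ** 2) / math.sqrt(2.0 * math.pi)

def critical_points(n, mean):
    """Equation (3) for n > 1 and mean > 0: (abscissa of minimum, abscissa of maximum)."""
    disc = mean ** 2 - 4.0 * (n - 1)
    if disc <= 0.0:
        return None  # radical not real, or the two roots coincide: no pair of extrema
    r = math.sqrt(disc)
    return ((mean - r) / 2.0) ** n, ((mean + r) / 2.0) ** n

n, mean = 3, 4.0
x_min, x_max = critical_points(n, mean)
print(x_min, x_max)
for label, x0 in (("minimum", x_min), ("maximum", x_max)):
    # density just left of, at, and just right of the critical abscissa
    print(label, density(0.99 * x0, n, mean), density(x0, n, mean), density(1.01 * x0, n, mean))
```

For these values the negative sign in (3) gives the abscissa $(2 - \sqrt{2})^3$ and the positive sign gives $(2 + \sqrt{2})^3$.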

In equation (1) we restricted $a$ and $b$ to be zero or positive, but when
$n$ is an odd positive integer or the reciprocal of an odd positive integer,
it follows at once that (2) gives the frequency curve corresponding to
negative values of $x'$ that arise from the transformation $x' = x^n$ when $x$
is negative. By taking $\bar{x}$ sufficiently large, the function (2) may be made
as nearly zero as we please for negative values of $x'$ except at points near
the discontinuity at $x' = 0$. This discontinuity exists when $n > 1$.
When $n$ is an odd number or the reciprocal of an odd number, the derivative
$dy'/dx'$ changes sign at $x' = 0$. When $n$ is the reciprocal of an odd
positive integer, there is a minimum at $x' = 0$, and the value of the
function is zero at this minimum.

We shall find it convenient to consider the frequency curves given by
(2) under three cases according as $n > 1$, $0 < n < 1$, or $n < 0$. We shall
limit our discussion to positive values of $x$ and $x'$ except when there is a
specific statement extending the treatment to negative values.

Case I. n > 1. 

The maximal frequency corresponds to the value of $x'$ given by taking
the positive sign before the radical in (3). In the language of statistics,
the abscissa of this maximal frequency is called the modal value or the
mode. We shall find it convenient to use these expressions later in this
paper. The curve for $n = 3$, $\bar{x} = 4$ is shown in Fig. 1 for positive
values of $x'$.




Fig. 1. $y' = \dfrac{1}{3\sqrt{2\pi}\,x'^{2/3}}\, e^{-\frac{(x'^{1/3}-4)^2}{2}}$, $x' > 0$.

The skew appearance of the figure shows that the distribution is not
even approximately a normal distribution. Since variates at the median
of the original normal distribution must be transformed to the median
of the new distribution, we may appropriately compare the value $\bar{x}^n$ of
the median of the new distribution with the modal value given by (3).
When $n > 1$, it follows from (3) that the mode of the new distribution
is less than its median. The minimum that corresponds to the value of
$x'$ given by taking the negative sign before the radical in (3) is of special
interest because the existence of this minimum would probably not be
expected by ordinary intuition. Thus, when $\bar{x}^2 > 4(n-1)$, we have a
minimum between the origin and the maximum discussed above. This
minimum is shown in Fig. 1 at a point near the origin for the case $n = 3$,
$\bar{x} = 4$. The descent of the curve from infinity at $x' = 0$ to the minimum
at $x' = (2 - \sqrt{2})^3$ is so rapid and the curve is so near the $Y$-axis that it
cannot be shown well on the scale of Fig. 1. For this reason we show the
curve in the neighborhood of this minimum in Fig. 2 on an enlarged scale.
The function $y'$ ($n$ an odd number) has real positive values when $x'$ is
negative, but the values differ very little from zero, except near $x' = 0$,
when $\bar{x} = 4$. For the case $\bar{x} = 4$ shown in Fig. 1, the values of $y'$ for
negative values of $x'$ are so near zero, except at points near the
discontinuity at $x' = 0$, that it is impractical to exhibit the curve on the scale
of Fig. 1 for negative values of $x'$.

Fig. 2. Portion of curve of Fig. 1 from $x' = 0$ to $x' = 1$ with enlarged horizontal scale.

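Next we find $d^2y'/dx'^2$. It turns out that the points of inflection are
given by the solutions of the equation

$$x'^{\frac{1}{n}-3}\left[ x'^{\frac{4}{n}} - 2\bar{x}\,x'^{\frac{3}{n}} + (3n - 4 + \bar{x}^2)\,x'^{\frac{2}{n}} + \bar{x}(3 - 3n)\,x'^{\frac{1}{n}} + (n-1)(2n-1) \right] = 0. \tag{4}$$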

When $n = 3$, $\bar{x} = 4$, one point of inflection is at $x' = 1$ as shown in
Fig. 1. Furthermore, it is easily verified when $\bar{x} = n + 1$, that (4) has
a solution $x' = 1$, and that there is a point of inflection at $x' = 1$. There
is also, in general, another point of inflection on the curve to the right of
the maximum.
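The statement that (4) has the solution $x' = 1$ when $\bar{x} = n + 1$ can be verified directly. In the sketch below, the bracketed factor of (4) is written in the auxiliary variable $t = x'^{1/n}$; the names `inflection_quartic` and `second_difference`, the step size, and the test values are assumptions made for illustration only.

```python
import math

def inflection_quartic(t, n, mean):
    """Bracketed factor of equation (4), written in t = x'**(1/n)."""
    return (t ** 4 - 2.0 * mean * t ** 3 + (3 * n - 4 + mean ** 2) * t ** 2
            + mean * (3 - 3 * n) * t + (n - 1) * (2 * n - 1))

def density(x_prime, n, mean):
    """Density (2) for x' > 0 and n > 0, unit standard deviation."""
    root = x_prime ** (1.0 / n)
    return root / (n * x_prime) * math.exp(-0.5 * (root - mean) ** 2) / math.sqrt(2.0 * math.pi)

def second_difference(f, x, h=1e-3):
    """Central finite-difference estimate of the second derivative of f at x."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2

# The factor vanishes at t = 1 (that is, x' = 1) whenever mean = n + 1.
quartic_values = [inflection_quartic(1.0, n, n + 1.0) for n in (2, 3, 4, 5)]
print(quartic_values)

# For n = 3, mean = 4 the curvature of (2) changes sign across x' = 1.
f = lambda x: density(x, 3, 4.0)
curv_left, curv_right = second_difference(f, 0.9), second_difference(f, 1.1)
print(curv_left, curv_right)
```

A change of sign in the estimated second derivative on either side of $x' = 1$ is what a point of inflection there requires.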

The general appearance of the frequency curve (2) depends much on
the value of $\bar{x}^2$ compared to $4(n-1)$. When $\bar{x}^2 \leq 4(n-1)$, the function
(2) has no maximum nor minimum, but is a monotone decreasing function
of $x'$. When $\bar{x}^2 = 4(n-1)$, there is a point of inflection at $x' = \bar{x}^n/2^n$.

The problem when $n = 2$ and $\bar{x} > 2$ presents a point of special
interest. Thus, if $\bar{x}$ were assigned larger and larger values, the $x'$-coordinate
of the minimum would approach zero and that of the maximum would
approach the median $\bar{x}^2$. This is seen from the fact that

$$\tfrac{1}{4}\left(\bar{x} - \sqrt{\bar{x}^2 - 4}\right)^2$$

and

$$\bar{x}^2 - \tfrac{1}{4}\left(\bar{x} + \sqrt{\bar{x}^2 - 4}\right)^2$$

are monotone decreasing functions of $\bar{x}$.

An analogous result holds for the minimum when $n > 2$, but it does
not hold for the maximum. Thus, when $n$ has any assigned value $> 2$,
the $x'$-coordinate

$$x' = \frac{1}{2^n}\left(\bar{x} - \sqrt{\bar{x}^2 - 4(n-1)}\right)^n$$

of the minimum is a monotone decreasing function of $\bar{x}$, and approaches
zero as a limit when $\bar{x}$ is increased indefinitely, but the mode does not,
in general, approach the median $\bar{x}^n$ as a limit when $\bar{x}$ is increased.
However, the ratio

$$\frac{1}{2^n \bar{x}^n}\left(\bar{x} + \sqrt{\bar{x}^2 - 4(n-1)}\right)^n$$

of the mode to the median approaches the limit 1 as $\bar{x}$ is increased
indefinitely.
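The behavior described for $n > 2$ can be tabulated. The following sketch, with the assumed function name `mode_and_minimum` and the assumed values $n = 3$ and $\bar{x} = 4, 10, 100, 1000$, prints the abscissa of the minimum, the ratio of mode to median, and the difference between median and mode.

```python
import math

def mode_and_minimum(n, mean):
    """Abscissas of the maximum and of the minimum from (3); needs mean**2 > 4(n-1)."""
    r = math.sqrt(mean ** 2 - 4.0 * (n - 1))
    return ((mean + r) / 2.0) ** n, ((mean - r) / 2.0) ** n

n = 3
means = (4.0, 10.0, 100.0, 1000.0)
minima, ratios, gaps = [], [], []
for mean in means:
    mode, minimum = mode_and_minimum(n, mean)
    median = mean ** n
    minima.append(minimum)        # tends to zero as the mean grows
    ratios.append(mode / median)  # tends to one as the mean grows
    gaps.append(median - mode)    # does not shrink for n = 3
    print(mean, minimum, mode / median, median - mode)
```

In this run the ratio rises toward 1 while the difference between median and mode keeps growing, which is the distinction the paragraph above draws.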

The rapidity of approach to the limiting values depends on the
smallness of the ratio $4(n-1)/\bar{x}^2$. Hence, in order that the frequency curve
may descend rapidly to a minimum in the neighborhood of the
discontinuity at $x' = 0$, and in order that the mode shall be relatively near the
median, it is necessary that $4(n-1)/\bar{x}^2$ shall be small. This condition
is clearly necessary in order that the new frequency curve shall have
roughly the appearance of a normal curve when we neglect the part
of this curve which belongs to the interval from $x' = 0$ to the minimum.

Case II. 0 < n < 1.

In this case, make $n = 1/m$, where $m > 1$. This case thus includes
the distribution obtained by taking positive integral roots of a set of
variates. We shall limit our considerations to the principal real values
of the functions.

The equation (2) may be written

$$y' = \frac{m\,x'^{\,m-1}}{\sqrt{2\pi}}\, e^{-\frac{(x'^{m}-\bar{x})^2}{2}}. \tag{5}$$

When $n < 1$, it follows from (3) that the mode is greater than the
median of the new distribution. There is a minimum at $x' = 0$ when
$m$ is an odd number $> 1$, and we have in this case a maximum given by
the negative value of $x'$ obtained from (3), since $y'$ is positive for
negative $x'$ and vanishes both at $x' = 0$ and as $x'$ decreases indefinitely.
If $4(n-1)/\bar{x}^2$ is small, the
value of the function for $x' < 0$ is too nearly zero to distinguish the curve
from the $x'$-axis when drawn on a scale suitable for reproduction on an
ordinary page. Further, if $4(n-1)/\bar{x}^2$ is small, the curve for $x' > 0$
may be described roughly as having the general appearance of a normal
curve, but differing from the normal curve both in that it is somewhat
skew, and in that $y' = 0$ at a finite point.

Case III. n < 0. 

Let $n = -m$.
Then the frequency curve becomes

$$y' = \frac{1}{m\,x'^{\,1+\frac{1}{m}}\sqrt{2\pi}}\, e^{-\frac{(x'^{-1/m}-\bar{x})^2}{2}}. \tag{6}$$

By giving $y'$ the value zero when $x' = 0$, the curve becomes continuous
at the origin.

The distribution has a modal value

$$x' = \frac{\left(\sqrt{\bar{x}^2 + 4(m+1)} - \bar{x}\right)^m}{2^m (m+1)^m}.$$

This mode is less than the median $1/\bar{x}^m$.
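The modal value for Case III can be compared with the density (6) numerically. The sketch below uses assumed names (`reciprocal_power_density`, `reciprocal_power_mode`) and the assumed values $m = 1, 2$ with $\bar{x} = 4$; it checks that the density is larger at the stated mode than one per cent to either side, and that the mode lies below the median.

```python
import math

def reciprocal_power_density(x_prime, m, mean):
    """Density (6) of x' = x**(-m) for x' > 0, unit standard deviation."""
    root = x_prime ** (-1.0 / m)   # the original variate x
    scale = m * x_prime ** (1.0 + 1.0 / m) * math.sqrt(2.0 * math.pi)
    return math.exp(-0.5 * (root - mean) ** 2) / scale

def reciprocal_power_mode(m, mean):
    """Closed form for the modal value stated after (6)."""
    return (math.sqrt(mean ** 2 + 4.0 * (m + 1)) - mean) ** m / (2.0 ** m * (m + 1) ** m)

mean = 4.0
results = {}
for m in (1, 2):
    mode = reciprocal_power_mode(m, mean)
    median = 1.0 / mean ** m
    here = reciprocal_power_density(mode, m, mean)
    below = reciprocal_power_density(0.99 * mode, m, mean)
    above = reciprocal_power_density(1.01 * mode, m, mean)
    results[m] = (mode, median, below, here, above)
    print(m, mode, median, below, here, above)
```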

From the three cases examined relative to values of $n$, we may now
state the theorem that the $n$th powers of a set of normally distributed positive
variates give a distribution whose modal value is greater or less than its
median according as the value of $n$ is or is not between 0 and 1.
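The theorem can be illustrated for a handful of exponents. In the sketch below the mode is taken from the positive sign in (3) and the median is $\bar{x}^n$; the function name `mode_and_median`, the chosen exponents, and the value $\bar{x} = 4$ are assumptions for illustration.

```python
import math

def mode_and_median(n, mean):
    """Mode from the positive sign in (3) and median mean**n, for positive variates."""
    disc = mean ** 2 - 4.0 * (n - 1)
    if disc <= 0.0:
        raise ValueError("no maximum: mean**2 must exceed 4(n - 1)")
    t_mode = (mean + math.sqrt(disc)) / 2.0   # value of x at the mode
    return t_mode ** n, mean ** n

mean = 4.0
comparison = {}
for n in (-2, -1, 1.0 / 3.0, 0.5, 2, 3):
    mode, median = mode_and_median(n, mean)
    comparison[n] = mode > median
    print(n, mode, median, "mode > median" if mode > median else "mode < median")
```

Only the exponents between 0 and 1 should report a mode greater than the median.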

The simplicity of the examination of the frequency distributions
obtained from a normal distribution by the transformation $x' = x^n$ arises
from the fact that the equation

$$\frac{dy'}{dx'} = 0$$

has a quadratic factor in the variable $x = x'^{1/n}$ for which we solved to
determine maxima and minima.

The occurrence of this quadratic factor suggests the problem of
finding other functions

$$x' = f(x) \tag{7}$$

which would lead to a quadratic equation in $x$, and in more special cases
to a linear equation in $x$, for finding maxima and minima of the frequency
distribution obtained by the transformation (7).

Assume that (7) may be solved for $x$ giving a single-valued function

$$x = \varphi(x'). \tag{8}$$

Then the frequency curve of $x'$-variates is

$$y' = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{[\varphi(x')-\bar{x}]^2}{2}}\, \frac{dx}{dx'},$$

and

$$\frac{dy'}{dx'} = 0$$

gives the maxima and minima.
We may now seek the function that will make the equation

$$\frac{d^2x}{dx'^2} - \left(\frac{dx}{dx'}\right)^2 (x - \bar{x}) = 0 \tag{10}$$

have a quadratic factor in $x$. This would be the case if

$$\left(\frac{dx}{dx'}\right)^2 \bigg/ \frac{d^2x}{dx'^2}$$

were a linear function of $x$, say $cx + c_1$.
That is,

$$cx\,\frac{d^2x}{dx'^2} + c_1\,\frac{d^2x}{dx'^2} = \left(\frac{dx}{dx'}\right)^2. \tag{11}$$

Let $p = dx/dx'$, so that $d^2x/dx'^2 = p\,dp/dx$. Then (11) becomes

$$cx\,\frac{dp}{dx} + c_1\,\frac{dp}{dx} = p,$$

and apart from the trivial solution $x$ = a constant, we have the solutions

$$x' = c_2'\,(cx + c_1)^{\frac{c-1}{c}} = c_2\,[x + c_1(1-n)]^n, \tag{12}$$

and

$$x' = c_3 \log(x + c_1). \tag{13}$$

Here (12) holds for $c \neq 1$ with $n = (c-1)/c$, and (13) for $c = 1$.



Thus we find that the logarithms of the variates as well as their powers
are distributed in accord with frequency curves whose maxima and
minima are easily obtained because of the quadratic factor in (10) when
$x' = \log x$. The frequency distribution for the transformation $x' = \log x$
is similar to that of the case $0 < n < 1$ discussed above in that the mode
is greater than the median.
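For $x' = \log x$ we have $dx/dx' = d^2x/dx'^2 = x$, so that (10) reduces to $x\,[1 - x(x - \bar{x})] = 0$, whose quadratic factor is $x^2 - \bar{x}x - 1 = 0$. The sketch below checks the resulting mode numerically; the names `log_density` and `log_mode` and the value $\bar{x} = 4$ are assumptions for illustration, and only positive variates are considered.

```python
import math

def log_density(x_prime, mean):
    """Density of x' = log x when x > 0 is normal with the given mean, unit standard deviation."""
    x = math.exp(x_prime)          # original variate; dx/dx' = x
    return x * math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def log_mode(mean):
    """Mode of x' from the positive root of x**2 - mean*x - 1 = 0."""
    x = (mean + math.sqrt(mean ** 2 + 4.0)) / 2.0
    return math.log(x)

mean = 4.0
mode = log_mode(mean)
median = math.log(mean)
peak = log_density(mode, mean)
print(mode, median)
print(log_density(mode - 0.01, mean), peak, log_density(mode + 0.01, mean))
```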

If

$$\left(\frac{dx}{dx'}\right)^2 \bigg/ \frac{d^2x}{dx'^2} = c_1,$$

where $c_1$ is a constant, the equation (10) has, in general, a linear factor, and

$$x' = c_4\, e^{-x/c_1} + c_5. \tag{14}$$

Thus we find that a simple exponential transformation of variates
leads to a linear factor in equation (10).
Another transformation that would, in general, lead to a quadratic
factor in (10) is given by making

$$\frac{d^2x}{dx'^2} \bigg/ \left(\frac{dx}{dx'}\right)^2 = Ax^2 + Bx + C.$$

From this equation

$$x' = c_1 \int \frac{dx}{e^{\frac{Ax^3}{3} + \frac{Bx^2}{2} + Cx}},$$

which could hardly be regarded as a simple transformation in which we
are likely to be interested unless $A = B = 0$, but this special case gives
simply the transformation (14).
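As a concrete instance of the exponential case, take $x' = e^{x}$. Then $dx/dx' = 1/x'$ and $d^2x/dx'^2 = -1/x'^2$, so that (10) reduces to the linear equation $x = \bar{x} - 1$, and the mode of the $x'$-variates falls at $e^{\bar{x}-1}$. The sketch below checks this numerically; the names `exp_density` and `exp_mode` and the value $\bar{x} = 4$ are assumptions for illustration.

```python
import math

def exp_density(x_prime, mean):
    """Density of x' = exp(x) when x is normal with the given mean, unit standard deviation."""
    x = math.log(x_prime)          # original variate; dx/dx' = 1/x'
    return math.exp(-0.5 * (x - mean) ** 2) / (x_prime * math.sqrt(2.0 * math.pi))

def exp_mode(mean):
    """Mode of x' from the linear factor x = mean - 1 of (10)."""
    return math.exp(mean - 1.0)

mean = 4.0
mode = exp_mode(mean)
median = math.exp(mean)
peak = exp_density(mode, mean)
print(mode, median)
print(exp_density(0.99 * mode, mean), peak, exp_density(1.01 * mode, mean))
```

Here the single root of the linear factor locates the only maximum, and it lies below the median $e^{\bar{x}}$.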

The University of Iowa,
Iowa City, Iowa.



